Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices

Authors

  • Tingxing Dong
  • Azzam Haidar
  • Stanimire Tomov
  • Jack J. Dongarra
Abstract

Many GPU applications give rise to batched problems: linear algebra operations on a large number of small matrices, which form a challenging class of workloads. To solve them, we designed batched BLAS (Basic Linear Algebra Subprograms) routines, in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines. Our batched BLAS design introduces device functions and big-tile settings, and we used auto-tuning to optimize the different instances of the GEMV routine. We illustrate the approach by progressively optimizing batched bidiagonalization on a K40c GPU. The optimization techniques in this paper also apply to the other two-sided factorizations.
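To make the term "batched GEMV" concrete, the following is a minimal CPU reference sketch of what such a routine computes: the same Level-2 BLAS update y <- alpha*A*x + beta*y applied independently to every small matrix in a batch. It is not the authors' GPU implementation; the function names `gemv` and `batched_gemv` and the sample data are hypothetical, chosen only for illustration, and the sketch ignores everything that matters for performance (device functions, tiling, auto-tuning).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gemv(alpha: float, A: Matrix, x: Vector, beta: float, y: Vector) -> Vector:
    """Return alpha * A @ x + beta * y for one small matrix (no transpose)."""
    if any(len(row) != len(x) for row in A) or len(A) != len(y):
        raise ValueError("dimension mismatch")
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(A, y)
    ]


def batched_gemv(alpha: float, As: List[Matrix], xs: List[Vector],
                 beta: float, ys: List[Vector]) -> List[Vector]:
    """Hypothetical helper: apply gemv independently to each problem in the batch.

    On a GPU each problem would typically be handled by its own thread block;
    here a plain loop stands in for that parallelism.
    """
    if not (len(As) == len(xs) == len(ys)):
        raise ValueError("batch sizes differ")
    return [gemv(alpha, A, x, beta, y) for A, x, y in zip(As, xs, ys)]


# A batch of two independent 2x2 problems (illustrative data).
As = [[[1.0, 2.0], [3.0, 4.0]],
      [[0.0, 1.0], [1.0, 0.0]]]
xs = [[1.0, 1.0], [2.0, 3.0]]
ys = [[0.0, 0.0], [1.0, 1.0]]

result = batched_gemv(1.0, As, xs, 1.0, ys)
print(result)  # [[3.0, 7.0], [4.0, 3.0]]
```

Because the problems in a batch share no data, the outer loop is trivially parallel, which is what makes the batched formulation a natural fit for GPUs when each individual matrix is too small to occupy the device on its own.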

Similar resources

Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications

Low-rank approximation of large and/or sparse matrices is important in many applications, and the singular value decomposition (SVD) gives the best low-rank approximations with respect to unitarily-invariant norms. In this paper we show that good low-rank approximations can be directly obtained from the Lanczos bidiagonalization process applied to the given matrix without computing any SVD. We ...

The irlba Package

The irlba package provides a fast way to compute partial singular value decompositions (SVD) of large sparse or dense matrices. Recent additions to the package can also compute fast partial symmetric eigenvalue decompositions and principal components. The package is an R implementation of the augmented implicitly restarted Lanczos bidiagonalization algorithm of Jim Baglama and Lothar Reichel. S...

Evaluation of a Fast Algorithm for the Eigen-Decomposition of Large Block Toeplitz Matrices with Application to 5D Seismic Data Interpolation

We present a fast 5D (frequency and 4 spatial axes) reconstruction method that uses the Multichannel Singular Spectrum Analysis / Cadzow algorithm. Rather than embedding the 4D spatial volume in a Hankel matrix, we propose to embed the data into a block Toeplitz form. Rank reduction is carried out via Lanczos bidiagonalization with fast block Toeplitz matrix-times-vector multiplications via 4D Fast...

New Fast and Accurate Jacobi SVD Algorithm: II. LAPACK Working Note 170

This paper presents a new implementation of one-sided Jacobi SVD for triangular matrices and its use as the core routine in a new preconditioned Jacobi SVD algorithm, recently proposed by the authors. A new pivot strategy exploits the triangular form and uses the fact that the input triangular matrix is the result of rank-revealing QR factorization. If used in the preconditioned Jacobi SVD algorith...

New Fast and Accurate Jacobi SVD Algorithm. II

This paper presents a new implementation of one-sided Jacobi SVD for triangular matrices and its use as the core routine in a new preconditioned Jacobi SVD algorithm, recently proposed by the authors. A new pivot strategy exploits the triangular form and uses the fact that the input triangular matrix is the result of rank-revealing QR factorization. If used in the preconditioned Jacobi SVD algorith...

Publication date: 2017